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A square root approach is considered for the problem of accounting for model noise in the forecast 
step of the ensemble Kalman filter (EnKF) and related algorithms. The primary aim is to replace the 
method of simulated, pseudo-random, additive noise so as to eliminate the associated sampling errors. 
The core method is based on the analysis step of ensemble square root filters, and consists in the 
deterministic computation of a transform matrix. The theoretical advantages regarding dynamical 
consistency are surveyed, applying equally well to the square root method in the analysis step. A 
fundamental problem due to the limited size of the ensemble subspace is discussed, and novel solutions 
that complement the core method are suggested and studied. Benchmarks from twin experiments 
with simple, low-order dynamics indicate improved performance over standard approaches such as 
additive, simulated noise and multiplicative inflation. 


1 Introduction 

The EnKF is a popular method for doing data assimi¬ 
lation (DA) in the geosciences. This study is concerned 
with the treatment of model noise in the EnKF forecast 
step. 

1.1 Relevance and scope 

While uncertainty quantification is an important end prod¬ 
uct of any estimation procedure, it is paramount in DA 
due to the sequentiality and the need to correctly weight 
the observations at the next time step. The two main 
sources of uncertainty in a forecast are the initial con¬ 


ditions and model error (Slingo and Palmer, 2011). Ac¬ 
counting for model error is therefore essential in DA. 

Model error, the discrepancy between nature and com¬ 
putational model, can be due to incomplete understand¬ 
ing, linearisation, truncation, sub-grid-scale processes, and 


numerical imprecision (Li et al., 200£; Nicolis, 2004). For 
the purposes of DA, however, model error is frequently de¬ 
scribed as a stochastic, additive, stationary, zero-centred, 
spatially correlated, Gaussian white noise process. This is 
highly unrealistic, yet defensible in view of the multitude 
of unknown error sources, th e central limit theorem, and 
tractability ( Jazwinski , 1970 , §3.8). Another issue is that 
the size and complexity of geoscientific models makes it 
infeasible to estimate the model error statistics to a high 
degree of detail and accuracy, nece s sitati ng further reduc¬ 
tion of its parameterisations (Dec, |1995|). 
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The model error in this study adheres to all of the 
above assumptions. This, however, renders it indistin¬ 
guishable from a noise process, even from our omniscient 
point of view. Thus, this study effectively also pertains 
to natural noises not generally classified as model error, 
such as inherent stochasticity (e.g. quantum mechanics) 
and stochastic, external forcings (e.g. cosmic microwave 
radiation). Therefore, while model error remains the pri¬ 
mary motivation, model noise is henceforth the designa¬ 
tion most used. It is left to future studies to recuperate 
more generality by scaling back on the assumptions. 

Several studies in the literature are concerned with 
the estimation of model error, as well as its treatment in 
a DA scheme (Daley , 1992|; Mitchell and Carrassi , 2014 ; 


Zupanski and Zupanski , 2006 ). The scope of this study is 
more restricted, addressing the treatment only. To that 
end, it is functional to assume that the noise statistics, 
namely the mean and covariance, are perfectly known. 
This unrealistic assumption is therefore made, allowing 
us to focus solely on the problem of incorporating or ac¬ 
counting for model noise in the EnKF. 

1.2 Model noise treatment in the EnKF 

From its inception, the EnKF has explicitly considered 
model noise and accounted for it in a Monte-Carlo way: 
adding s imulated , pseu do-random noise to the state reali¬ 
sations (Evensen, 1994). A popular alternative technique 
is multiplicative inflation, where the spread of the ensem¬ 
ble is increased by some “inflation factor”. Several com- 
parisons of these techniques exist in th e literature (e.g. 
Deng et al, 2011; Hamill and Whitaker, 2005; Whitaker 


et al., 2008). 
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et al.[7 2005; Hunt et al., [2004 ; Whitaker et al., 2004). 


Quite frequently, however, model noise is not explic- where E(.) and Var(.) are the (multivariate) expectation 
itly accounted for, but treated simultaneously with other and variance operators. In the linear-Gaussian case, these 
system errors, notably sampling error and errors in the characterise p(aV | y 1:t_1 ) andp(x t \ y 1:t ), and are given, 
specification of the noise statistics (Anderson, 2009; [Houtckamw :ursively in time for sequentially increasing indices, t, 

by the Kalman filter equations. 

The EnKF is an algorithm to approximately sample 
ensembles, Xi-.n = {x n \ n = 1 : TV}, from these distribu¬ 
tions. Note that the positive integer TV is used to denote 
ensemble size, while m and p have been used to denote 
state and observation vector lengths. For convenience, all 
of the state realisations are assembled into the “ensemble 
matrix”: 


This is because (a) inflation can be used to compensate 
for these system errors too, and (b) tuning separate in¬ 
flation factors seems wasteful or even infeasible. Nev¬ 
ertheless, even in realistic settings, it can be rewarding 
to tre at model er ror explicitly. For example, Whitaker 
and H; mill ( [2012 ) show evidence that, in the presence of 
multiple sources of error, a tuned combination of a multi¬ 
plicative technique and additive noise is superior to either 
technique used alone. 

Section discusses the EnKF model noise incorpora¬ 
tion techniques most relevant to this manuscript. How¬ 
ever, the scope of this manuscript is not to provide a full 
comparison of all of the alternative under all relevant cir¬ 
cumstances, but to focus on the square root approach. 
Techniques not considered any further here inclu de using 
more complicated stochastic parameterisations (Arnold 
et al., 2013; Berry and Harlim, 2014), physics-based forc- 


ings such as stochastic kinetic energy backscatter (Shutts 


2005), relaxation (Zhang et al., 2004), and boundary con¬ 


dition forcings. 

1.3 Framework 

Suppose the state and observation, 
respectively, are generated by: 


„t+1 


= f(x t ) + q t 


y l = Hcc* 


£C 4 <E R m and y* £ R p 

* = 0 , 1 ,..., ( 1 ) 

t = 1,2,..., (2) 


are 


-AA(O.Q), r* ~ A/"(0, R), 


x 


AOy 0 ,P°). (3) 


The observation operator, H £ R pxm , has been assumed 
linear because that is how it will effectively be treated 


anyway (through the augmentation trick of e.g. Ander- 
The parameter fi° £ 


son 


2001 ). 


,m is assumed known, 
as are the symmetric, positive-definite (SPD) covariance 
matrices P",Q £ R m and R £ R p . Generalisation to 
time-dependent Q, R, /, and H is straightforward. 

Consider p{x l \ y 1:t ), the Bayesian probability distri¬ 
bution of x * conditioned on all of the previous observa¬ 
tions, y 1:t , where the colon indicates an integer sequence. 
The recursive filtering process is usually broken into two 
steps: the forecast step, whose output is denoted by the 
superscript /, and the analysis step, whose output is de¬ 
noted using the superscript a. Accordingly, the first and 
second moments of the distributions are denoted 


x f =E (aV* -1 ), 


X 


= \y ) 


P f =Yar(x t \ y 1 *- 1 ). 
P“ = Yar(x t \y 1:t ), 


(4) 

(5) 


E = [®i, 


xn] 


A related matrix is that of the “anomalies”: 


A = E(Ijv — fli) = E(Ijv — 11 t /TV), 


( 6 ) 


(7) 


where 1 £ 




is the column vector of ones, 1 T is its 


transpose, and the matrix I n is the TV-by-TV identity. The 
conventional estimators serve as ensemble counterparts 
to the exact first and second order moments of eqns. (|J) 
and (§), 


x? — 


1 


= n eH 


-J 


1 


/a/ j 


TV- 1 
1 

TV- T 


A J A 


A"A“ 


( 8 ) 

(9) 


where the Gaussian white noise processes { q 4 | t = 0,1,...} 
and {r* | t = 1,2,...}, and the initial condition, a; 0 , 
specified by: 


where, again, the superscripts indicate the conditioning. 
Furthermore, A (without any superscript) is henceforth 
used to refer to the anomalies at an intermediate stage 
in the forecast step, before model noise incorporation. 
In summary, the superscript usage of the EnKF cycle is 
illustrated by 

Forecast step 


. a Model integration, 


» A 


Model noise A f Analysis 

- > A - 


eqn. 


incorporation 


eqns. (17,21) 

v- 

Analysis step 


• A" 


Although the first A" of the diagram is associated with 
the time step before that of A, A , and the latter A“, this 
ambiguity becomes moot by focusing on the analysis step 
and the forecast step separately. 

1.4 Layout 

The proposed methods to account for model noise builds 
on the square root method of the analysis step, which is 
described in section |[ The core of the proposed methods 
is then set forth in section |j. Properties of both methods 
are analysed in section |[ Alternative techniques, against 
which the proposed method is compared, are outlined in 
section |^. Based on these alternatives, section || intro¬ 
duces methods to account for the residual noise result¬ 
ing from the core method. It therefore connects to, and 
completes, section [| The set-up and results of numeri¬ 
cal experiments are given in section [ 7 ] and section ||. A 
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summary is provided, along with final discussions, in sec¬ 
tion The appendices provide additional details on the 
properties of the proposed square root methods. 


denote the “normalised” anomalies and mean innovation 
of the ensemble of observations. Recalling eqn. (pi) it can 
then be shown that eqns. © to (Jl2|) are satisfied if: 


2 The square root method in the 
analysis step 

Before introducing the square root method for the EnKF 
forecast step, which accounts for model noise, we here 
briefly discuss the square root method in the analysis 
step. 

2.1 Motivation 

It is desirable that P ^ = p a // anc [ x a/f _ x a/f through¬ 
out the DA process. This means that the Kalman filter 
equations, with the ensemble estimates swapped in, 


K = P / H t (HP / H t + R)- 1 , 

( 10 ) 

x a = x-f + K[y Hxf ], 

( 11 ) 

P° = [l ro -KH]P f , 

( 12 ) 


x a = x? + A 1 G a S T s , (17) 

A“A aT = A / G q A /T , (18) 

where the two forms of G“, 

G a = Ijv — S T (SS T + l P ) -1 S (19) 

= (S T S + Itv) -1 , (20) 


are li nked through the Woodbury identity (e.g. |Wunsch| , 
2006). Therefore, if A“ is computed by 


A“ = A f T a 


( 21 ) 


with T“ being a matrix square root of G“, then A“ sat¬ 
isfies eqn. ( 0 ) exactly. Moreover, “square root update” 
is henceforth the term used to refer to any update of the 
anomalies through the right-multiplication of a transform 
matrix, as in eqn. ©• The ensemble is obtained by re¬ 
combining the anomalies and the mean: 


should be satisfied by E“ from the analysis update. 


Let D 0 bs £ 


N 


be a matrix whose columns are 


drawn independently from 7V(0, R). Unfortunately, the 


perturbed observations analysis update (Burgers et al 
1998D, 


E a = E / + R| ? /l T + D ohll -HE / } , (13) 


K jyl -i- u» obs 
only yields the intended covariance, eqn. (|l2|), on average: 

E(P“) = [I m - KH]P ; , (14) 

where the expectation, E, is taken with respect to D 0 b s - 


2.2 Method 

On the other hand, the square root analysis update satis¬ 
fies eqmjjl^)_exactly. Originally introduced to the EnKF 
by B ishop et al. ( 2001 ), the square root analysis approach 
was soon conn e cted to classic square root Kalman filters 
( Tippett et al. , 2003 ). But while the primary intention of 
classic square root Kalman filters was to improve on the 
numerical stability of the Kalman filter (Anderson and 
Moore, 1979), the main purpose of the square root EnKF 
was rather to eliminate the stochasticity and the accom¬ 
panying sampling errors of the perturbed-observations 
analysis update 0). 

Assume that p < to, or that R is diagonal, or that 
R -1 / 2 is readily computed. Then, both for notational 


and computational (Hunt et al., 2007) simplicity, let 


S = R _1/2 (HA / )/VF^l £ 

s = R~ 1//2 [y — Hx^]/y/N — 1 £ 


axJV , (15) 
(16) 


( 22 ) 


Equation ( |2C| ) implies that G a is SPD. The matrix T is 
a square root of G a if it satisfies 


E“ = x a t 1 + A 


2.3 The symmetric square root 


T a T“ 


(23) 


However, by substitution into eqn. (0) it is clear that 
T“n is also a square root of G a , for any orthogonal matrix 
fi. There are therefore infinitely many square roots. Nev¬ 
ertheless, some have properties that make them unique. 
For example, the Cholesky factor is unique as the only 
triangular square root with positive diagonal entries. 

Here, however, the square root of most interest is the 
symmetric one, T“ = VZ 1/,2 V T . Here, VZV 1 = G a is 
an eigendecomposition of G°, and Z 1 1 is defined as the 


entry-wise positive square root of Z (Horn and Johnson, 
2013, Th. 7.2.6). Its existence follows from the spectral 


theorem, and its uniqueness from that of the eigendecom¬ 
position. Note its distinction by the s subscript. 

It has been gradually discovered that the symmetric 
square root choice has several advantageous properties 
for its use in eqn. (j2l|), one of which is that the it does 
not affect the ensemble mean (e.g. Evensen, [2009; Wang 


and Bishop, 2003), which is updated by eqn. (|17) apart 
from the anomalies. Further advantages are surveyed in 
section |[ providing strong justification for choosing the 
symmetric square root, and strong motivation to extend 
the square root approach to the forecast step. 
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3 The square root method in the 
forecast step 

Section reviewed the square root update method for the 
analysis step of the EnKF. In view of its improvements 
over the Monte-Carlo method, it is expected that a simi¬ 
lar scheme for incorporating the model noise into the fore¬ 
cast ensemble, , would be beneficial. Section 
such a scheme: Sqrt-Core. First, however, 
illuminates the motivation: forecast step sampling error. 

3.1 Forecast sampling errors in the clas¬ 
sic EnKF 

Assume linear dynamics, f : x >-> /( x) = Fee, for ease of 
illustration. The Monte-Carlo simulation of eqn. (jl|) can 
be written 


3.2 derives 


section 


3.1 


E f = FE a + D , 


(24) 


where the columns of D are drawn from Af( 0, Q) by 


D = Q 1/2 Z, (25) 

where =. = [£i, ... £ n , ... £jv] , and each £„ is in¬ 

dependently drawn from A/"(0, l m ). Note that different 
choices of the square root, say Q 1 ,/2 and Q 1/2 n, yield 
equally-distributed random variables, Q 1 ^ 2 ^ and q 1 /2 q$. 
Therefore the choice does not matter, and is left unspec¬ 
ified. It is typical to eliminate sampling error of the first 
order by centering the model noise perturbations so that 
Dl = 0. This introduces dependence between the sam¬ 
ples and reduces the variance. The latter is compensated 
for by rescaling by a factor of yjN/{N — 1). The result 
is that 

P f = FP F t + Q (26) 

+(Q Q) (fa u d t + D(FA“) t ) , 

as per eqn. (||), where Q = (N — 1) _1 DD 1 . But, for the 
same reasons as for the analysis step, ideally: 

P f = FP a F T + Q . (27) 

Thus, the second line of eqn. (^g|) constitutes a stochastic 
discrepancy from the desired relations (|27j). 

3.2 The square root method for model 
noise — Sqrt-Core 

As illustrated in section |l.3| , define A as the anomalies of 
the propagated ensemble before noise incorporation: 

A = /(E a )(ljv — ll T /iV) , (28) 


where / is applied column-wise to E“. Then the desired 
relation (^?|) is satisfied if A 1 satisfies: 

A / A /T = AA r + (N - 1)Q . (29) 

However, A^ can only have N columns. Thus, the prob¬ 
lem of finding an A 1 that satisfies eqn. (^9|) is ill-posed, 
since the right hand side of eqn. (|2^) is of rank m for 
arbitrary, full-rank Q, while the left hand side is of rank 
N or less. 

Therefore, let A + be the Moore-Penrose pseudoinverse 
of A, denote rh = AA~ the orthogonal projector onto 
the column space of A, and define Q = IKOIIa the “two- 
sided” projection of Q. Note that the orthogonality of the 
projector, 11 a , induces its symmetry. Instead of eqn. (|2^), 
the core square root model noise incorporation method 
proposed here, Sqrt-Core, only aims to satisfy 

A f A fT = AA r + (N - 1)Q . (30) 

By virtue of the projection, eqn. (|3^) can be written as 

G f = \ n + (N-1)A + Q(A + ) t , (31) 

A¥ T = AG f A T . (32) 

Thus, with T f being a square root of G J , the update 

A f = AT f (33) 

accounts for the component of the noise quantified by Q. 
The difference between the right hand sides of eqns. ( p9| ) 
and @), (N-1)[Q-Q\ , is henceforth referred to as the 
“residual noise” covariance matrix. Accounting for it is 
not trivial. This discussion is resumed in section |g]. 

As for the analysis step, we choose to use the sym¬ 
metric square root, T{, of G f . Note that two SVDs are 
required to perform this step: one to calculate A - , and 
one to calculate the symmetric square root of . Fortu¬ 
nately, both are relatively computationally inexpensive, 
needing only to calculate N— 1 singular values and vec¬ 
tors. For later use, define the square root “additive equiv¬ 
alent”: 


D = A ; A = A[T{ - I 


N 


(34) 


3.3 Preservation of the mean 

The square root update is a deterministic scheme that 
satisfies the covariance update relations exactly (in the 
space of A). But in updating the anomalies, the mean 
should remain the same. For Sqrt -Core, this c an be 
shown to hold true in the same way as Livings et al. ( 2008| ) 
did for the analysis step, with the addition of eqn. (|36|). 


Theorem 1 
If A 1 — AT{, 


- Mean preservation. 
then 


A f t = 0. 


(35) 
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I. e. the symmetric square root choice for the model noise 
transform matrix preserves the ensemble mean. 

Proof. For any matrix A, 

A+=A t (AA t )+, (36) 

( Ben-Israel and Grevill4 |2003| , §1.6). Thus, 

G f t = 1 + (N - 1)A + Q(AA t )+A1 = 1, (37) 

as per eqn. (^8|). But the eigenvectors of the square of 
a diagonalisable matrix are the same as for the original 
matrix, with squared eigenvalues. Thus eqn. (^) implies 
A J 1 = AT{l = Al = 0. □ 


4 Dynamical consistency of square 
root updates 

Many dynamical systems embody “balances” or constraints 
on the state space ( can Leeuwen , 2009 ). For reasons of 
complexity and efficiency these concerns are often not 
encoded in the prior ( Wang et al. , 2011 ). They are there¬ 
fore not considered by the statistical updates, resulting in 
state realisations that are inadmissible because of a lack 
of dynamical consistency or physical feasibility. Typi¬ 
cal consequence of breaking such constraints include un¬ 
bounded growth (“bl ow up”), exemp li fied by the quasi- 
geostrophic model of Sakov and Okc ( 2008a ), or failure 
of the model to converge, exemplified by reservoir simu¬ 
lators (Chen and Oliver, 2013| ). 

This section provides a formal review of the properties 
of the square root update as regards dynamical consis¬ 
tency, presenting theoretical support for the square root 
method. The discussion concerns any square root update, 
and is therefore relevant for the square root method in the 
analysis step as well as for Sqrt-Core. 


4.1 Affine subspace confinement 

The fact that the square root update A i-> AT is a right- 
multiplication means that each column of the updated 
anomalies is a linear combination of the original anoma¬ 
lies. On the other hand, T itself depends on A. In recog¬ 
nition of these two aspects, Evensen ( [2003 ) called such 
an update a “weakly nonlinear combination”. However, 
our preference is to describe the update as confined to 
the affine subspace of the original ensemble, that is the 
affine space x + span(A). 


4.2 Satisfying equality constraints 

It seems reasonable to assume that the updated ensemble, 
being in the space of the original one, stands a fair chance 
of being dynamically consistent. However, if consistency 
can be described as equality constraints, then discussions 
thereof can be made much more formal and specific, as 
is the purpose of this subsection. In so doing, it uncovers 


a couple of interesting, hitherto unnoticed advantage of 
the symmetric square root choice. 

Suppose the original ensemble, or E, satisfies 

Cx n = d for all n = 1 : N , i.e. 

CE = dt T . (38) 


One example is conservation of mass, in which case the 
state, x, would contain grid-block densities, while the 
constraint coefficients, C, would be a row vector of the 
corresponding volumes, and d would be the total mass. 


Anot her example is geostrophic balance (e.g. Hoang et al 
20051 ), in which case x would hold horizontal velocity 


components and sea surface heights, while C would con¬ 
catenate the identity and a discretised horizontal differ¬ 
entiation operator, and d would be zero. 

The constraints @) should hold also after the update. 
Visibly, if d is zero, any right-multiplication of E, i.e. 
any combination of its columns, will also satisfy the con¬ 
straints. This provides formal justification for the propo¬ 
sition of Evensen ( 2003| ), that the “linearity” of the EnKF 
update implicitly ensures respecting linear constraints. 

One can also write 


Cx = d , 
CA = 01 T . 


(39) 

(40) 


implying (|3^) provided E = S1 T +A holds. Equations ( |39| ) 
and (^o|) show that the ensemble mean and anomalies can 
be thought of as particular and homogeneous solutions to 
the constraints. They also indicate that in a square root 
update, even if d is not zero, one only needs to ensure 
that the mean constraints are satisfied, because the ho¬ 


mogeneity of eqn. (40) means that any right-multiplying 
update to A will satisfy the anomaly constraints. How¬ 
ever, as mentioned above, unless it preserves the mean, 
it might perturb eqn. (|39|). A corollary of Theorem [1] is 
therefore that the symmetric choice for the square root 
update also satisfies inhomogeneous constraints. 

Finally, in the case of nonlinear constraints, e.g. c 6’{x n ) 
d, truncating the Taylor expansion of ^ yields 


CA « [d — x)]\ 1 , 


(41) 


where C = |^(x). Contrary to eqn. (f!o|), the approxi¬ 
mate constraints of eqn. are not homogeneous, and 
therefore not satisfied by any right-multiplying update. 
Again, however, by Theorem [I], the symmetric square 
root appears an advantageous choice, because it has 1 
as an eigenvector with eigenvalue 1, and therefore satis¬ 
fies the (approximate) constraints. 


4.3 Optimality of the symmetric choice 

A number of related properties on the optimality of the 
symmetric square root exist scattered in the literature. 
However, to the best of our knowledge, these have yet to 
be reunited into a unified discussion. Similarly, consider¬ 
ations on their implications on DA have so far not been 
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collected. These are the aims of this subsection. 


strain the creation of dynamical inconsistencies. 


Theorem 2 Minimal ensemble displacement. 

Consider the ensemble anomalies A with ensemble covari¬ 
ance matrix P. and let q n be column n of D = AT —A: the 
displacement of the n-th anomaly through a square root 
update. The symmetric square root, T s , minimises 


J(T) = jV^lE || 9« || i (42) 

n 

= trace ([AT - A] T (AA r )+ [AT - A]) (43) 


I AT 


among all T £ 
some SPD matrix G. Equation ( 


such that ATT 1 A 1 = AGA 1 , for 


coincides with eqn. (42) 


if P exists, but is also valid if not. 


Theorem |2| was proven by Ott et al. ( 2004 ), and later 
restated by Hunt et ak| ( [2007 ) as the constrained opti¬ 
mum of the Frobenius norm of [T — I jv] - Another inter¬ 
esting and desirable property of the symmetric square 
root is the fact that the updated ensemble members are 
all equally lik e ly realisations of the est imated posterior 
( McLay et ah , 200S| ; Wang et al. , 2004 ). More recently, 
the choice of mapping between the original and the up¬ 
dated ensembles has been formulated through optimal 
transport theory (Cotter and Reich, 2012; Oliver, 2014). 
However, the cost functions therein typically use a differ¬ 
ent weighting on the norm than J(T), in one case yield¬ 
ing an optimum that is the symmetric /e/f-multiplying 
transform matrix - not to be confused with the right- 
multiplying one of Theorem [^. 

Theorem |] and the related properties should bene¬ 
fit the performance of filters employing the square root 
update, whether for the analysis step, the model noise in¬ 
corporation, or both. In part, this is conjectured because 
minimising the displacement of an update means that the 
ensemble cloud should retain some of its shape, and with 
it higher-order, non-Gaussian information, as illustrated 
in Fig. 0. 

A different set of reasons to expect strong perfor¬ 
mance from the symmetric square root choice is that it 
should promote dynamical consistency, particularly re¬ 
garding inequality constraints, such as the inherent pos¬ 
itivity of concentration variables, as well as no n-lin ear 
equality constraints, initially discussed in section 4.2, In 


either case it stands to reason that smaller displacements 
are less likely to break the constraints, and therefore that 
their minimisation should inhibit it. Additionally, it is 
important when using “local analysis” localisation that 
the ensemble is updated similarly at nearby grid points. 
Statistically, this is ensured by employing smoothly de¬ 
caying localisation functions, so that G does not jump too 
mu ch from one grid p oint to the next. But, as pointed out 
by Hunt et ah] ( [2007 ), in order to translate this smooth¬ 
ness to dynamical consistency, it is also crucial that the 
square root is continuous in G. Furthermore, even if G 
does jump from one grid point to the next, it still seems 
plausible that the minimisation of displacement might re- 


5 Alternative approaches 

This section describes the model noise incorporation meth¬ 
ods most relevant methods to this study. Table |l] sum¬ 
marises the methods that will be used in numerical com¬ 
parison experime nts . Add-Q is the classic method de¬ 
tailed in section 3T. Mult-1 and Mult-to are multi¬ 
plicative inflation methods. The rightmost column re¬ 
lates the different methods to each other by succinctly 
expressing the degree to which they satisfy eqn. (29); it 
can also be used as a starting point for their derivation. 
Note that Mult-1 only satisfies one degree of freedom of 
eqn. (^9[), while Mult-to satisfies m degrees, and would 
therefore be expected to perform better in general. It is 
clear that Mult-1 and Mult-to will generally not pro¬ 
vide an exact statistical update, no matter how big N 
is, while Add-Q reproduces all of the moments almost 
surely as N — > oo. By comparison, Sqrt-Core guar¬ 
antees obtaining the correct first two moments for any 
IV > to, but does not guarantee higher order moments. 

Using a large ensemble size, Fig. [I| illustrates the dif¬ 
ferent techniques. Notably, the cloud of Add-Q is clearly 
more dispersed than any of the other methods. Further¬ 
more, in comparison to Mult-to and Mult-1, Sqrt- 
Core significantly skewers the distribution in order to 
satisfy the off-diagonal conditions. 


• None 

• Add-Q 
Mult-1 

• Mult-m 

• Sqrt-Core 


/ /' * 







-3Q.tr- 


Figure 1: Scatter plot of ense mble forecasts with the three- 
dimensional Lorenz-63 system (Lorenz. 1965) using different 
schemes to account for the model noise, which is specified by 
A tQ = diag([36.00, 3.60,1.08]) and makes up approximately 
30% of the total spread of the updated ensembles. Each 
dot corresponds to the “( x,y )” coordinate of one realisation 
among N = 400. 


Continuing from section 1.2, the following details other 
pertinent alternatives, some of them sharing some simi¬ 
larity with the square root methods proposed here. 

One alternative is to resample the ensemble fully from 
Af(0. AA 1 /(N— 1 ) +Q). However, this incurs larger sam¬ 
pling errors than Add-Q, and is more likely to cause dy¬ 
namical inconsistencies. 
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Table 1: Comparison of some model noise incorporation methods. 


Description 

Label 

A' = 

where 

thus satisfying 

Additive, simulated noise 

Add-Q 

A + D 

D is a centred sample from A/"(0,Q) 

E D (eqn. (p^)) 

Scalar inflation 

Mult-1 

AA 

A 2 = trace(P) -1 trace(P -1- Q) 

trace(eqn. (|29|)) 

Multivariate inflation 

MuLT-m 

AA 

A 2 = diag(P) -1 diag(P + Q) 

diag(eqn. (f29|) ) 

Core square root method 

Sqrt-Core 

AT 

T= (l 7V + (V-l)A + QA +T ) s 1/2 

n A (eqn. @)fl A 


Second-order exact sampling (Pham, 2001) attempts 
to sample noise under the restriction that all of the terms 
on the second line of eqn. ([27]) be zero. It requires a 
very large ensemble size (N > 2m), and is therefore typ¬ 
ically not applicable, though recent work indicate that 
this might be circumvented (Hoteit et al., 2015). 

Th e singular evolutive interpolated Kalman (SEIK) 
filter ( Hoteit et al. , 2002 ) has a slightly less primitive 
and intuitive formalism than the EnKF, typically work¬ 
ing with matrices of size m x (N — 1). Moreover, it does 
not have a separate step to deal with model noise, treat¬ 
ing it instead implicitly, as part of the analysis step. This 
lack of modularity has the drawback that the frequency 
of model noise incorporation is not controllable: in case 
of multiple model integration steps between observations, 
the noise should be incorporated at each step in order to 
evolve with the dynamics; under different circumstances, 
skipping the treatment of noise for a few steps can be 
cost efficient (Evensen and van Leeuwen, 1996). Never¬ 
theless, a stand-alone model noise step can be distilled 

from the SEIK algorithm as a whole. Its forecast co- 

-/ 

variance matrix, P , would equal to that of Sqrt-Core: 
n A (P + Q)n A . However, unlike Sqrt-Core, which uses 
the symmetric square root, the SEIK uses random rota¬ 
tion matrices to update the ensemble. Also, the SEIK 
filter uses a “forgetting factor”. Among other system er¬ 
rors, this is intended to account for the r esid ual noise 
covariance, [Q-Q]. As outlined in section [k^, however, 
this factor is not explicitly a function of [Q Q] : it is in¬ 
stead obtained from manual tuning. Moreover, it is only 
applied in the update of the ensemble mean. 

Another method is to include only the N — 1 largest 
eigenvalue c omponents of P+Q, as i n redu ced-rank square 
root filters (Verlaan and Heemink, 1997), and some ver¬ 


sions of the unscented Kalman filter (Chandrasekar et al 


2008). This method can be referred to as T-SVD because 


the update can be effectuated through a truncated SVD 

of [P ' ,Q 1/2 ], where the choices of square roots do not 
matter. It captures more of the total variance than Sqrt- 
Core, but also changes the ensemble subspace. More¬ 
over, it is not clear how to choose the updated ensemble. 
For example, one would suspect dynamical inconsisten¬ 
cies to arise from using the ordered sequence of the trun¬ 
cated SVD. Right-multiplying by random rotation ma¬ 
trices, as in the SEIK, might be a good solution. Or, if 
computed in terms of a Ze/f-multiplying transform matrix, 
the symmetric choice is likely a good one. Building on T- 


SVD, the “partially orthogonal” EnKF and the COFFEE 
algorithm of (Hanea et al., 2007; Heemink et al., 2001) 


also recognise the issue of the residual noise. In contrasts 
with the treatments proposed in this study, these meth¬ 
ods introduce a complementary ensemble to account for 
it. 


6 Improving Sqrt-Core: Account¬ 
ing for the residual noise 

As explained in section |4.l| , Sqrt-Core can only incor¬ 
porate noise components that are in the span (range) of 
A. This leaves a residual noise component unaccounted 
for, orthogonal to the span of A, with [Q — Q] posing as 
its covariance matrix. 

First consider why there is no such residual of R for 
the square root methods in the analysis step: because the 
analysis step subtracts uncertainty, unlike the forecast 
step which adds it. Therefore the presence or absence of 
components of R outside of the span of the observation 
ensemble makes no difference to the analysis covariance 
update because the ensemble effectively already assumes 
zero uncertainty in these directions. 

In the rest of this section the question addressed is 
how to deal with the residual noise. It is assumed that 
Sqrt-Core, eqn. (|3[), has already been performed. The 
techniques proposed thus complement Sqrt-Core, but 
do not themselves possess the beneficial properties of Sqrt- 
Core discussed in section f|. Also, the notation of the 
previous section is reused. Thus, the aim of this section 
is to find an A r £ R mx7V that satisfies, in some limited 
sense 

A j A /T = AA t + (N - 1) [Q - Q]. (44) 

6.1 Complementary, additive sampling — 
Sqrt-Add-Z 

Let Q 1/2 be a any square root of Q, and define 

Q 1/2 = n A Q 1/2 , (45) 

Z = (l m -n A )Q 1/2 , (46) 

the orthogonal projection of Q 1 ^ 2 onto the column space 
of A, and the complement, respectively. 

A first suggestion to account for the residual noise is to 
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use one of the techniques of section ||, with [Q-Q] taking 
the place of the full Q in their formulae. In particular, 
with Add-Q in mind, the fact that 

Q 1 /2 = qi/2 +z ( 47 ) 

motivates sampling the residual noise using Z. That is, in 
addition to D of Sqrt-Core, which accounts for Q, one 
also adds D = ZE to the ensemble, where the columns of 
E are drawn independently from A/”(0,l m ). We call this 
technique Sqrt-Add-Z. 

Note that Q 1 / 2 , defined by eqn. (fill), is a square root 
of Q. By contrast, multiplying eqn. ( fl^ ) with its own 
transpose yields 

ZZ T = [Q - Q] - Q 1/2 Z T - ZQ T / 2 , (48) 

and reveals that Z is not a square root of [Q — Q] . There¬ 
fore, with expectation over E, Sqrt-Add-Z does not re¬ 
spect E(eqn. ([h|)), as one would hope. 

Thus, Sqrt-Add-Z has a bias equal to the cross term 
sum, Q!/2Z T + ZQ T / 2 =]Q-Q]-ZZ t . Notwithstanding 
this problem, Corollary |l| of appendix A shows that the 
cross terms sum, has a spectrum symmetric around 0, and 
thus zero trace. To some extent, this exonerates Sqrt- 
Add-Z, since it means that the expected total variance 
is unbiased. 

6.2 The underlying problem: replacing 
a single draw with two independent 
draws 

Since any element of Q is smaller than the correspond¬ 
ing element in Q, either one of the multiplicative infla¬ 
tion techniques can be applied to account for [Q-Q] 
without second thoughts. Using Mult-1 would satisfy 
trace(eqn. (Q)), while Mult-to would satisfy diag(eqn. ( ff4| ) 
However, the problem highlighted for Sqrt-Add-Z is not 
just a technicality. In fact, as shown in appendix A sec¬ 
tion M [Q Q] has negative eigenvalues because of the 
cross terms. It is therefore not a valid covariance ma¬ 
trix in the sense that it has no real square root: samples 
with covariance [Q-Q] will necessarily be complex num¬ 
bers; this would generally be physically unrealisable and 
therefore inadmissible. This underlying problem seems to 
question the validity of the whole approach of splitting up 
Q and dealing with the parts Q and [Q — Q] separately. 

Let use emphasise the word independently, because 
that is, to a first approximation, what we are attempting 
to do: replacing a single draw from Q by one from Q plus 
another, independent draw from [Q-Q]. Rather than 
considering N anomalies, let us now focus on a single one, 
and drop the n index. Define the two random variables, 

q = 0 1/2 * + Z*, (49) 

<^=Q 1 / 2 £+Z£, (50) 

where £ are random variables independently drawn 


from A/"(0,l m ). By eqn. (|47|), and design, q can be iden¬ 
tified with any of the columns of D of eqn. ( |25| ) and, 
furthermore, Var(q) = Q. On the other hand, while q 
originates in a single random draw, q ^ is the sum of two 
independent draws. 

The dependence between the terms of q, and the lack 
thereof for q^, yields the following discrepancy between 
the variances: 

Var(q) = Q + ZZ T + Q 1/2 Z T + ZQ T / 2 , (51) 

Var(q JI ) =Q + ZZ t . (52) 

Formally, this is the same problem that was identified 
with eqn. (fl8|), namely that of finding a real square root 
of [Q — Q], or eliminating the cross terms. But eqns. © 
and ( |52| ) show that the problem arises from the more pri¬ 
mal problem of trying to emulate q by q ^. Vice versa, 
Qi/2 Z T = 0 would imply that the ostentatiously depen¬ 
dent terms, Q 1 / 2 ^ and Z£, are independent, and thus q^ 
is emulated by q. 

6.3 Reintroducing dependence — Sqrt-Dep 

As already noted, though, making the cross terms zero is 
not possible for general A and Q. However, the perspec¬ 
tive of q and q 11 hints at another approach: reintroduc¬ 
ing dependence between the draws. In this section we will 
reintroduce dependence by making the residual sampling 
depend on the square root equivalent, D of eqn. (pi[). 

The trouble with the cross terms is that Q “gets in the 
way” between IR and (\ m — 11a), whose product would 
otherwise be zero. Although less ambitious than emulat¬ 
ing q with q 11 , it is possible to emulate a single draw 
from A/"[0, l m ], e.g. £, with two independent draws: 

£ JL = n|+(l ro -n)|, (53) 

where, as before, £ and £ are independent random vari¬ 
ables with law A/"(0,l m ), and n is some orthogonal pro¬ 
jection matrix. Then, as the cross terms cancel, 

nn T + (i m -n)(i m -n) T = i m , (54) 

and thus Var(^ JL ) = Var(£). 

We can take advantage of this emulation possibility 
by choosing fl as the orthogonal projector onto the rows 


of Q 1 / 2 . Instead of eqn. (|49|), redefine q as 

q = Q 1/2 |^ . (55) 

Then, since Var(^ iL ) = l m , 

Var(q) = Q 1/2 l m Q T/2 = Q , (56) 

as desired. But also 

q = (Q 1/2 + z) (n£ + (i m - n)tj (57) 
= Q 1/2 | + z(n|+(i m -n)|) . (58) 
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The point is that, while maintaining Yar(q) = Q, and 
despite the reintroduction of dependence between the two 
terms in eqn. (|58|), the influence of £ has been confined 
to span(Z) = span(A)U The above reflections yield the 
following algorithm, labelled Sqrt-Dep: 

1. Perform the core square root update for Q, eqn. (|33|); 

2. Find E such that Qs^Z = D of eqn. (|34|). Cornpo- 
nents in the kernel of Q s ' are inconsequential; 

3. Sample E by drawing each column independently 
from A/”(0, l m ); 

4. Compute the residual noise, D. and add it to the 
ensemble anomalies; 

6 = z (nz + (i m - n)=). ( 59 ) 

Unfortunately, this algorithm requires the additional SVD 
of Q 1 / 2 in order to compute n and E. Also, despite the 
reintroduction of dependence, Sqrt-Dep is not fully con¬ 
sistent, as discussed in appendix B. 


7 Experimental set-up 

The model noise incorporation methods detailed in sec¬ 
tions and H are benchmarked using “twin experiments”, 
where a “truth” trajectory is generated and subsequently 
estimated by the ensemble DA systems. As indicated by 
eqns. (Q) and (§), stochastic noise is added to the truth 
trajectory and observations, respectively. As defined in 
eqn. ®,Q implicitly includes a scaling by the model time 
step, At, which is the duration between successive time 
indices. Observations are not taken at every time index, 
but after a duration, At 0 b s , called the DA window, which 
is a multiple of At. 

The noise realisations excepted, the observation pro¬ 
cess, eqn. (|j), given by H, R, and Af 0 b s , and the forecast 
process, eqn. ®, given by /, fi°, P l) and Q, are both 
perfectly known to the DA system. The analysis update 
is performed using the symmetric square root update of 
section |2| for all of the methods under comparison. Thus, 
the only difference between the ensemble DA systems is 
their model noise incorporation method. 

Performance is measured by the root-mean-square er¬ 
ror of the ensemble mean, given by: 


analysis update, the ensemble anomalies are rescaled by 
a scalar inflation factor intended to compensate for the 


consequences of sampling error in the analysis (e.g. An¬ 


derson and Anderson, 1999; Bocquet, 2011). This fac¬ 


tor, listed in Table |2|, was approximately optimally tuned 
prior to each experiment. In this tuning process the Add- 
Q method was used for the forecast noise incorporation, 
putting it at a slight advantage relative to the other meth¬ 
ods. 

In addition to the EnKF with different model incor¬ 
poration methods, the twin experiments are also run with 
the standard methods of Table [I] for comparison, as well 
as three further baselines: (a) the climatology, estimated 
from several long, free runs of the system, (b) 3D-Var 
(optimal interpolation) with the background from the cli¬ 
matology, and (c) the extended Kalman filter ( Rodgers , 
2QQCj ). 


7.1 The linear advection model 

The linear advection model evolves according to 

x* +1 = 0.98a}.! (61) 

for t = 0 ,..., i = 1 : m, with m = 1000, and periodic 
boundary conditions. The dissipative factor is there to 
counteract amplitude growth due to model noise. Direct 
observations of the truth are taken at p = 40 equidistant 
locations, with R = 0.01l p , every fifth time step. 

The initial ensemble members, { x ° | n = 1 : TV}, 
as well as the truth, x °, are generated as a sum of 25 
sinusoids of random amplitude and phase, 

i 25 

x °n = — ^2a*sm(2nk [i/m+y k n \) , (62) 

C n , 
k =1 

where off and ip!f is drawn independently and uniformly 
from the interval (0,1) for each n and k, and the normal¬ 
isation constant, c n , is such that the standard deviation 
of each is 1. Note that the spatial mean of each re¬ 
alisation of eqn. (|62|) is zero. The model noise is given 

by 


Q = 0.01 Var(a: 0 ). (63) 


RMSE = \l—\\x t - x 41,2 
m 


2 ’ 


(60) 


for a particular time index t. By convention, the RMSE 
is measured only immediately following each analysis up¬ 
date. In any case, there was little qualitative difference 
to “forecast” RMSE averages, which are measured right 
before the analysis update. The score is averaged for all 
analysis times after an initial transitory period whose du¬ 
ration is estimated beforehand by studying the RMSE 
time series. Each experiment is repeated 16 times with 
different initial random seeds. The empirical variances of 
the RMSEs are checked to ensure satisfying convergence. 

Covariance localisation is not used. Following each 


7.2 The Lorenz-96 model 

The Lorenz-96 model evolves according to 
dx ■ 

—A = ( x i+ i - Xi - 2 ) Xi -1 -Xi + F, 
at 


(64) 


for t > 0, and i = 1 : m, with periodic boundary con¬ 
ditions. It is a nonlinear, chaotic model that mimics the 
atmosphere at a ce rtain latitude circle. We use the pa¬ 
rameter settings of Lorenz and Emanuel ( 1998 ), with a 
system size of m = 40, a forcing of F = 8, and the fourth- 
order Runge-Kutta numerical time stepping scheme with 
a time step of At = 0.05. Unless otherwise stated, di- 
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Figure 2: Performance benchmarks as a function of the en¬ 
semble size, N, obtained with the linear advection system. 
The scale has been irregularly compressed for N > 60. 

rect observations of the entire state vector are taken a 
duration of Af 0 b s = 0.05 apart, with R = l m . 

The model noise is spatially homogeneous, generated 
using a Gaussian autocovariance function, 

Q i,j = exp ^—1/30||* - j\\fj + 0.1 Sij , (65) 

where the Kronecker delta, Sij, has been added for nu¬ 
merical stability issues. 

8 Experimental results 

Each figure contains the results from a set of experiments 
run for a range of some control variable. 



Figure 3: Performance benchmarks as a function of the en¬ 
semble size, N, obtained with the Lorenz-96 system. The 
climatology averages an RMSE of 3.69 for both figures. The 
scale has been irregularly compressed for IV > 40. 

the covariance matrices of Mult-1 is more harmful than 
the sampling error incurred by Add-Q. But, for 45 < 
N < 400, Mult-1 beats Add-Q. It is not clear why this 
reversal happens. 

Sqrt-Core performs quite similar to Mult-1. In the 
intermediate range, it is clearly deficient compared to the 
square root methods that account for residual noise, il¬ 
lustrating the importance of doing so. The performance 
of Sqrt-Dep is almost uniformly superior to all of the 
other methods. The only exception is around N = 25, 
where Add-Q slightly outperforms it. The computa¬ 
tionally cheaper Sqrt-Add-Z is beaten by Add-Q for 
N < 40, but has a surprisingly robust performance nev¬ 
ertheless. 


8.1 Linear advection 

Figure shows the RMSE versus the ensemble size for 
different model noise incorporation schemes. The maxi¬ 
mum wavenumber of eqn. ( |62| ) is k = 25. Thus, by the 
design of P° and Q, the dynamics will take place in a 
subspace of rank 50, even though rn = 1000. This is 
clearly reflected in the curves of the square root meth¬ 
ods, which all converge to the optimal performance of 
the Kalman filter (0.15) as N approaches 51, and Z goes 
to zero. Sqrt-Add-Z takes a little longer to converge 
because of numerical error. The multiplicative inflation 
curves are also constant for N > 51, but they do not 
achieve the same level of performance. As one would ex¬ 
pect, Add-Q also attains the performance of the Kalman 
filter as N —> oo. 

Interestingly, despite MuLT-m satisfying eqn. (^) to 
a higher degree than Mult-1, the latter performs dis¬ 
tinctly better across the whole range of N. This can likely 
be blamed on the fact that Mult-to has the adverse ef¬ 
fect of changing the subspace of the ensemble, though it 
is unclear why its worst performance occurs near N = 25. 

Add-Q clearly outperforms Mult- 1 in the interme¬ 
diate range of N, indicating that the loss of nuance in 


8.2 Lorenz-96 


Figure [l] shows the RMSE versus ensemble size. As with 
the linear advection model, the curves of the square root 
schemes are coincident when Z = 0, which here happens 
for N > m = 40. In contrast to the linear advection sys¬ 
tem, however, the square root methods still improve as 
N increases beyond m , and noticeably so until N = 60. 
This is because a larger enable is better able to charac¬ 
terise the non-Gaussianity of the distributions and the 
non-linearity of the models. On the other hand, the 
performance of the multiplicative inflation methods stag¬ 
nates around N = m, and even slightly deteriorates for 
larger N. T his can probably be att ributed to the effects 
observed by Sakov and Okt ( 2008b| ). 

Unlike the more ambiguous results of the linear ad¬ 
vection model, here Add-Q uniformly beats the multi¬ 
plicative inflation methods. Again, the importance of ac¬ 
counting for the residual noise is highlighted by the poor 
performance of Sqrt-Core for N < 40. However, even 
though Sqrt-Add-Z is biased, it outperforms Add-Q for 
N > 25, and approximately equals it for smaller N. 

The performance of Sqrt-Dep is nearly uniformly 
the best, the exception being at IV = 18, where it is 
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Figure 4: Performance benchmarks as a function of the 
data assimilation window, At 0 b s , obtained with the Lorenz-96 
model and N = 30. The climatology averages an RMSE of 
3.7. 


marginally beaten by Add-Q and Sqrt-Add-Z. The ex¬ 
istence of this occurrence can probably be attributed to 
the slight suboptimality discussed in Appendix B, as well 
as the advantage gained by Add-Q from using it to tune 
the analysis inflation. Note, though, that this region is 
hardly interesting, since results lie above the baseline of 
the extended KF. 

Add-Q asymptotically attains the performance of the 
square root methods. In fact, though it would have been 
imperceptible if added to Fig. |], experiments show that 
Add-Q beats Sqrt-Dep by an average RMSE difference 
of 0.005 at N = 800, as predicted in section ||. 

Figure |] shows the RMSE versus the DA window. The 
performance of Add-Q clearly deteriorates more than 
that of all of the deterministic methods as Af 0 b s in¬ 
creases. Indeed, the curves of Sqrt-Core and Add-Q 
cross at At 0 b s ~ 0.1, beyond which Sqrt-Core out¬ 
performs Add-Q. Sqrt-Core even gradually attains the 
performance of Sqrt-Add-Z, though this happens in a 
regime where all of the EnKF methods are beaten by 3D- 
Var. Again, however, Sqrt-Dep is uniformly superior, 
while Sqrt-Add-Z is uniformly the second best. Simi¬ 
lar tendencies were observed in experiments (not shown) 
with N = 25. 

Figure H shows the RMSE versus the amplitude of the 
noise. Towards the left, the curves converge to the same 
value as the noise approaches zero. At the higher end of 
the range, the curves of MuLT-m and Sqrt-Core are ap¬ 
proximately twice as steep as that of Sqrt-Dep. Again, 
Sqrt-Dep performs uniformly superior to the rest, with 
Sqrt-Add-Z performing second best. In contrasts, Add- 
Q performs worse than Mult-ttt for a noise strength 
multiplier smaller than 0.2, but better as the noise gets 
stronger. 


Figure 5: Performance benchmarks as a function of the noise 
strength, obtained with the Lorenz-96 model and N = 25. 
Both axes are logarithmic. On average, when Q is multiplied 
by 10~ 3 (resp. 10 -2 ,10 _1 ,10°, 10 1 ), the model noise makes up 
approximately 0.5 (resp. 4, 20, 70, 90) percent of the growth 
in the spread of the ensemble. The climatology averages an 
RMSE score of approximately 4. 


9 Summary and discussion 


The main effort of this study has been to extend the 
square root approach of the EnKF analysis step to the 
forecast step in order to account for model noise. Al¬ 
though the primary motivation is to eliminate the need 
for simulated, stochastic perturbations, the core method, 
Sqrt-Core, was also found to possess several other de¬ 
sirable properties, which it shares with the analysis square 
root update. In particular, a formal survey on these fea¬ 
tures revealed that the symmetric square root choice for 
the transform matrix can be beneficial in regards to dy¬ 
namical consistency. 

Yet, since it does not account for the residual noise, 
Sqrt-Core was found to be deficient in case the noise 
is strong and the dynamics relatively linear. In dealing 
with the residual noise, cursory experiments (not shown) 
suggested that an additive approach works better than a 
multiplicative approach, similar to the forgetting factor 
of the SEIK. This is likely a reflection of the relative per¬ 
formances of Add-Q and Mult-to, as well as the findings 
of Whitaker and lla.mil] ( 2012| ), which indicate that the 
additive approach is better suited to account for model 
error. Therefore, two additive techniques were proposed 
to complement Sqrt-Core, namely Sqrt-Add-Z and 
Sqrt-Dep. Adding simulated noise with no components 
in the ensemble subspace, Sqrt-Add-Z is computation¬ 
ally relatively cheap as well as intuitive. However, it was 
shown to yield biased covariance updates due to the pres¬ 
ence of cross terms. By reintroducing dependence be¬ 
tween the Sqrt-Core update and the sampled, residual 
noise, Sqrt-Dep remedies this deficiency at the cost of 
an additional SVD. 

The utility of the noise integration methods proposed 
will depend on the properties of the system under con¬ 
sideration. However, Sqrt-Dep was found to perform 


11 




























robustly (nearly uniformly) better than all of the other 
methods. Moreover, the computationally less expensive 
method Sqrt-Add-Z was also found to have robust per¬ 
formance. These findings are further supported by omit¬ 
ted experiments using fewer observations, larger observa¬ 
tion error, and different models. 


Future directions 


The model noise square root approach has shown signif¬ 
icant promise on low-order models, but has not yet been 
tested on realistic systems. It is also not clear how this 
approach performs with more realistic forms of model er¬ 
ror. 


As discussed in Appendix B, a more shrewd choice 
of Q 1 / 2 might improve Sqrt-Dep. This choice impacts 
E, but not the core method, as shown in Appendix A 
section AS, and should not be confused with the choice 
of . While the Cholesky factor yielded worse perfor¬ 
mance than the symmetric choice, other options should 
be contemp l ated. 

N akano 


proposed a method that is distinct, 
yet quite similar to Sqrt-Core, this should be explored 
further, in particular with regards to the residual noise. 
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A The residual noise 


Proof. Note that 

C v A = (l m - n A )Qn A e span(A) J_ , (68) 

Cv B = n A QttB e span(A). (69) 

As Cv = X [v A + vb], eqns. ( p8| ) and ( |69| ) imply that 

Cv A = Xv B , (70) 

Cv B = Xv A . (71) 

Therefore, 

Cu = Q[v a - v B \ = Xv B - Xv A = -A[v a - v B ] □ 
Equations (0) and (|7l|) can also be seen to imply (c). 

Corollary 1. 

trace(C) = 0. This follows from the fact that the trace of 
a matrix equals the sum of its eigenvalues. 

Corollary 2. 

||ua ||2 = II^bII^- This follows from the fact that v T u = 
(v A + v b ) t (v a — v B ) = v^v A — VgV B should be zero by 
the spectral theorem. 

Interestingly, imaginary, skew-symmetric matrices also 
have the property that their eigenvalues, all of which are 
real, come in positive/negative pairs. These matrices can 

m 2 

all be written M M for some M £ iR m , which is very 
reminiscent of C. However, it is not clear if these parallels 
can be used to prove Theorem || because M M 1 only 
has zeros on the diagonal, while C generally does not (by 
symmetry, it can be seen that this would imply C = 0). 
Also, Theorem H depends on the fact that the cross terms 
are “flanked” by orthogonal projection matrices, whereas 
there are no requirements on M. 


A.l The cross terms 

Let C be the sum of the two cross terms: 

c = Q 1 / 2 Z T + ZQ T / 2 ( 66 ) 

= n A Q(i m — n A ) + (i m — n A )Qn A . (67) 

Note that span(Q 1,/2 Z r ) C span(A) C ker(Q 1/,2 Z T ), and 
therefore Q 1 / 2 Z r (and its transpose) only has the eigen¬ 
value 0. Alternatively one can show that it is nilpotent 
of degree 2. By contrast, the nature of the eigenvalues of 
C is quite different. 

Theorem 3 - Properties of C. 

2 

The symmetry of C £ M. m implies, by the spectral theo¬ 
rem, that its spectrum is real. Suppose that X is a non¬ 
zero eigenvalue ofC , with eigenvector v = v A + v B , where 
v A = n A r and v B = (l m — l~l A )u. Then (a) u = v A — v B 
is also an eigenvector, (b) its eigenvalue is —A, and (c) 
neither v A nor v B are zero. 


A.2 The residual covariance matrix 

The residual, [Q-Q] , differs from the symmetric, positive 
matrix ZZ by the cross terms, C. The following theorem 
establishes a problematic consequence. 

Theorem 4 [Q Q] is not a covariance matrix. 
Provided C ^ 0, the residual ‘‘covariance” matrix, [Q Q], 
has negative eigenvalues. 

Proof. Since C is symmetric, and thus orthogonally di- 
agonalisable, the assumption that C / 0 implies that C 
has non-zero eigenvalues. Let v be the eigenvector of a 
non-zero eigenvalue, and write v = v A + v B , with v A £ 
span(A) and v B £ span(A)- 1 -. Then u T Cr = v A Qv B / 0. 
Define v a = v B + av A . Then: 

^[Q-Q]n a =^[ZZ T +C]n a (72) 

= v b Qv b + 2avjQv B . (73) 
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The second term can always be made negative, but larger 
in magnitude than the first, simply by choosing the sign 
of a and making it sufficiently large. □ 






A.3 Eliminating the cross terms 


Can the cross terms be entirely eliminated in some way? 
section 3.2 already answered this question in the nega¬ 
tive: there is no particular choice of the square root of 
Q, inducing a choice of Q 1 / 2 and Z through eqns. (ft5| ) 
and that eliminates the cross terms, C. 

But suppose we allow changing the ensemble sub¬ 
space. For example, suppose the partition Q 1 1 — Q 1 / 2 + 
Z uses the projector onto the N largest-eigenvalue eigen¬ 
vectors of Q instead of 11 a • It can then be shown that 
the cross terms are eliminated: Q 1 / 2 Z i = o, and hence 
C = 0 and Vai^g 21 ) = Q. A similar situation arises in 
the case of the COFFEE algorithm (section ||), explaining 
why it does not have the cross term problem. Another 
particular rank-fV square root that yields C = 0 is the 
lower-triangular Cholesky factor of Q with the last m — N 
columns set to zero. 

Unfortunately, for general Q and A, the ensemble sub¬ 
space will not be that of the rank-iV truncated Cholesky 
or eigenvalue subspace. Therefore neither of these options 
can be carried out using a right-multiplying square root. 


B Consistency of Sqrt-Dep 

Sqrt-Core ensures that eqn. ( |30| ) is satisfied, i.e. that 

^[A + D][A + D] t = P + Q, (74) 

where (N — 1)P = AA 1 . However, this does not imply 

/, » T 

that DD = (AT — 1)Q. Therefore, in reference to Sqrt- 

*.«.T ^ 

Dep, -- ^ l m . Instead, the magnitudes of D and — are 

minimised as much as possible, as per Theorem |^. 

However, Sqrt-Dep is designed assuming that E is 
stochastic, with its columns drawn independently from 
A/”(0, l m ). If this were the case, then Sqrt-Dep would be 
consistent in the sense of 

^e([A + D + D][A + D + D] t )=P+Q, (75) 


where the expectation is with respect to E and E. This 
follows from the consistency of q as defined in eqn. (j55|), 
which has Var(q) = Q, because each column of D = D + D 
is sampled in the same manner as q. 

The fact that D is in fact not stochastic, as Sqrt- 
Dep assumes, but typically of a much smaller magnitude, 
suggests a few possible venues for future improvement. 
For example we speculate that inflating E by a factor 
larger than one, possibly estimated in a similar fashion 
to Dee (1995). The value of E also depends on the choice 


of square root for Q 1 / 2 . It may therefore be a good idea 
to choose Q 1 / 2 somewhat randomly, so as to induce more 
randomness in the square root “noise”, E. One way of 
doing so is to apply a right-multiplying rotation matrix 
to Q 1 / 2 . Cursory experiments indicate that there may be 
improvements using either of the above two suggestions. 


C Left-multiplying formulation of 
Sqrt-Core 


Lemma 1. 

The row (and column) space o/T{ = (G^)s^ 2 is the row 
space of A. 

Proof. Let A = UZV be the SVD of A. Then: 

G ; =\ n + (N ~1)A + Q(A + ) t (76) 

= v(ijv + (iv-i)z + u T QU(r + ) T ) v T □ 


In view of Lemma [I] it seems reasonable that there 
should be a left-multiplying update, A 1 = LA, such that 
it equals the right-multiplying update, A 1 = AT{. Al¬ 
though IV < m in most applications of the EnKF, the 
left-multiplying update would be a lot less costly to com¬ 
pute than the right-multiplying one in such cases if N 
m. The following deri vation of an ex p licit fo rmula for L 
is very close to that of Sakov and Oke ( 2008b ), except for 
the addition of eqn. (|3(i|). Lemma [2| will also be of use. 


Lemma 2. 

For any matrices, A £ M mxN , M £ R m , and any positive 
integer, k, 


A(A T MA) k = (AA r M) fc A. (77) 


Theorem 5 - Left-multiplying transformation. 

For any ensemble anomaly matrix, A £ M. mxN , and any 
SPD matrix Q £ R m , 

AT{ = LA, (78) 

where 

T{ = (Lv + (iV —1)A+Q(A+) T ); /2 , (79) 

L = (l m + (N- 1)AA+Q(AA t )+) 1/2 . (80) 

In case N > m, eqn. (^) reduces to 

L= (l m + (7V-l)Q(AA T )- 1 ) 1/2 . (81) 


Note that (l m + AA + Q(AA T ) + ) is not a symmetric 
matrix. We can nevertheless define its square root as the 
square root obtained from its eigendecomposition, as was 
done for the symmetric square root in section 2.3. 


Proof. Assuming A + Q(A + ) T has eigenvalues less than 1, 
we can express the square root, (A+Q(A + ) T ) 1 / 2 , through 
its Taylor expansion ( polub and Van Loan| , |1996| , Th. 
9.1.2). Applying eqn. (j36[), followed by Lemma |2| with 
M = (AA t )+{N - l)Q(AA r ) + , and eqn. ( |36| ) the other 
way again, one obtains eqn. (|80|). 

If N > m, then rank(A) = m, unless the dynam¬ 
ics have made some of the anomalies collinear. Hence 
rank(AA T ) = m and so AA 1 is invertible, and AA + = I m . 
Thus, eqn. (|8(i| ) reduces to eqn. (|8H). □ 
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Note that the existence of a left-multiplying formula¬ 
tion of the right multiplying operation A H > AT;; could 
be used as a proof for Theorem |], because LAI = 0 by 
the definition ( |2£| ) of A. Finally, Theorem || provides an 
indirect formula for L. 

Theorem 6 - Indirect left-multiplying formula. 

If we have already calculated the right-multiplying trans¬ 
form matrix T{, then the we can obtain a corresponding 
left-multiplying matrix. L, from: 

L = AT{A + . (82) 

Proof. We need to show that LA = AT{. Note that A + A 
is the orthogonal (and hence symmetric) projector onto 
the row space of A, which Lemma [| showed is also the 
row and column space of T{. Therefore (A + A) = T{, 
and LA = AT{ (A+A) = AT{. □ 
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Table 2: Inflation factors used in benchmark experiments. 

Reads from left to right, corresponding to the abscissa of the 

plotted data series. 
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